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Chapter  1 

Abstract 


The  research  conducted  with  AFOSR  support  under  Grant  #  AF /F49620- 
03-1-0120  aimed  at  developing  algorithms  and  theory  for  reliable  re¬ 
trieval  of  information  hidden  in  noisy  measurements. 

A  wide  range  of  technologies,  from  medical  diagnostics  to  reconnais¬ 
sance  and  targeting,  rely  critically  on  the  quality  of  available  data.  Our 
ability  to  control  is  often  only  hindered  by  our  ability  to  see.  The  data, 
whether  collected  using  ultrasound  transducers,  radar,  or  a  distributed 
array  of  sensors  have  one  thing  in  common,  the  useful  information  is 
often  swarnped  in  noise.  The  research  focused  on  developing  a  next 
generation  of  spectral  analysis  tools  and  resolution  standards  that  pro¬ 
vide  the  maximal  amount  of  useful  information  as  well  as  quantitative 
assessment  of  the  remaining  uncertainty. 

The  research  effort  has  generated  algorithms  that  have  a  built-in 
ability  to  focus  in  on  particular  futures  of  recorded  signals.  The  project 
has  undergone  several  phases  already.  Early  work,  in  collaboration  with 
Professors  C.I.  Byrnes  and  A.  Lindquist,  produced  a  method  and  appa¬ 
ratus  for  a  Tunable  High-REsolution  spectral  Estimator  (U.S.  patent 
No.  6,400,310)  nicknamed  THREE.  A  very  substantial  improvement 
in  resolution  over  prior  state-of-the  art  was  documented  via  theoretical 
as  well  as  experimental  studies.  A  number  of  specialized  algorithms 
spawned  from  this  and  have  since  been  tested  on  synthetic  aperture 
radar  imaging  using  MSTAR  data  and  on  ultrasound  imaging  in  col¬ 
laboration  with  Professor  E.  Ebbini  of  the  University  of  Minnesota. 

Recently,  in  collaboration  with  Dr.  Dan  Herrick  (AFRL/DESA), 
high  resolution  methods  axe  being  tailored  to,  disturbance  isolation  of 
a  targeting  system  (e.g.,  laser)  using  input  from  a  distributed  array  of 
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sensors.  High  resolution  methods  can  be  used  for  sifting  efficiently 
through  huge  amounts  of  data  so  as  to  identify  sources  and  direc¬ 
tionality  of  disturbances.  Thereby  it  is  anticipated  that  the  research 
completed  under  the  grant  will  further  contribute  to  critical  targeting 
technologies. 

Besides  the  development  of  algorithms,  the  research  team  has  in¬ 
vested  on  the  development  of  mathematical  tools  and  standards  for 
quantifying  resolution  and  reliability  in  spectral  analysis  and  in  esti¬ 
mation.  These  are  expected  to  guide  the  tuning  of  control  and  mea¬ 
surement  strategies  in  the  near  future. 
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Chapter  2 

Summary  of  Research 
Contributions 


The  research  conducted  under  the  present  grant  (#  AF/F49620-03-1- 
0120)  has  led  to  a  range  of  tools  for  robust  and  high  resolution  signal 
analysis.  Broadly,  these  tools  consist  of 

(i)  new  theoretical  results  for  signal  analysis, 

(ii)  numerical  algorithms  for  high  resolution  analysis,  and 

(iii)  quantitative  metrics  for  assessing  performance. 

Moreover,  under  the  present  grant,  application  studies  have  been  con¬ 
ducted  in  utilizing  these  techniques  in  the  context  of  sensor  networks, 
ultrasound  imaging,  and  synthetic  aperture  radar  imaging. 

Signal  processing  is  enabling  technology  for  a  broad  range  of  appli¬ 
cations,  from  medical  diagnostics  to  reconnaissance  and  communica¬ 
tions.  Advances  in  signal  analysis  techniques  often  have  an  immediate 
impact  on  the  state-of-the  art  in  such  application  areas.  The  focus  of 
the  research  has  been  an  apparent  need  for  robust,  flexible,  and  high 
resolution  analysis  tools.  The  hallmark  of  our  approach  has  been  the 
development  of  algorithms  with  a  built-in  ability  to  focus  in  on  partic¬ 
ular  futures  of  recorded  signals. 

The  approach  we  initiated  has  undergone  several  phases  already. 
Early  work,  in  collaboration  with  Professors  C.I.  Byrnes  and  A.  Lindquist, 
produced  a  method  and  apparatus  for  a  Tunable  High-REsolution  spec¬ 
tral  Estimator  (U.S.  patent  No.  6,400,310)  nicknamed  THREE,  which 
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has  been  especially  suitable  for  time-series  analysis  and  speech  anal¬ 
ysis.  A  number  of  specialized  algorithms  spawned  from  this.  These 
algorithms  allow  integrating  data  from  a  variety  of  sensors  with  ar¬ 
bitrary  geometry  and  originating  from  multivariable  time-series.  The 
ability  to  deal  with  arbitrary  sensor  geometries  and,  at  the  same  time, 
to  ensure  theoretical  performance  bounds,  represents  a  quantum  jump 
in  the  state-of-the  art. 

The  signal  analysis  tools  that  we  have  developeded  are  intended 
to  separate  frequency  components,  identify  deterministic  components 
from  random  ones,  and  to  detect  long  range  vs.  short  range  interac¬ 
tions.  The  components  obtained  in  such  an  analysis  stage,  help  identify 
a  wide  range  of  underlying  physical  causes,  and  dynamical  dependen¬ 
cies.  A  particular  advantage  of  these  new  methods  is  that  they  have 
a  built-in  mechanism  for  taking  into  account  prior  information  about 
the  processes  under  consideration.  On  the  application  side,  we  have 
explored  the  relevance  of  these  techniques  in  two  main  areas.  We  con¬ 
ducted  case  studies  focusing  on  measuring  the  temperature  of  (arti¬ 
ficial)  tissue  in  a  non-invasive  manner  by  analysing  ultrasound  echo. 
The  purpose  of  such  “non-invasive  sensing”  is  to  provide  reliable  mea¬ 
surement  of  tissue  temperature  for  computer  guided  tumor  ablation 
and  therapy  [31].  The  results  have  been  very  encouraging  and  sub¬ 
stantially  better  than  earlier  state-of-the  art.  We  have  also  explored 
the  use  of  such  high  resolution  techniques  in  synthetic  apperture  radar 
(SAR)  with  similar  results. 

A  most  significant  breakthrough  has  been  a  theory  for  analyzing  and 
integrating  data  from  distributed  sensor  arrays  (e.g.,  see  [9]).  Interest¬ 
ingly,  this  advancement  shares  the  same  basic  framework  with  our  tech¬ 
niques  for  high  resolution  spectral  analysis.  In  either,  we  seek  a  power 
distribution  which  is  consistent  with  measurements.  Pre-conditioning 
of  the  data  may  be  taken  into  account,  and  post-processing  can  be  tai¬ 
lored  to  the  task  at  hand  based  on  prior  information  (e.g.,  accounting 
for  known  portion  of  the  power  and  tuning  for  high  resolution  in  a 
particular  frequency  or  spatial  sector).  Identifying  power  distributions 
consistently  with  given  measurements  is  treated  as  an  inverse  prob¬ 
lem.  The  family  of  such  distributions  is  suitably  parametrized,  and 
the  size  of  the  family  represents  a  measure  of  uncertainty.  When  data 
are  collected  via  a  spatially  scattered  collection  of  sensors,  the  relevant 
imaging  and  sensor  analysis  problems  are  cast  as  moment  problems. 
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Typically,  data  and  measurements  represent  radar/sonar  echo,  a 
speech  recording,  etc.,  and  may  be  sampled  non-uniformly  with  gaps 
on  the  record.  Our  computational  theory  allows  solving  the  most  gen¬ 
eral  such  multivariable  and  multidimensional  moment  problems  by  pro¬ 
viding  representative  spectra,  as  well  as  a  complete  description  of  all 
spectra  which  are  consistent  with  the  data.  Besides  the  relevance  of 
these  techniques  in  analysis,  they  also  impact  on  the  design  and  optimal 
distribution  of  sensors. 

The  publications  listed  below  document  our  research  and  accom¬ 
plishments.  Our  joint  work  with  C.  Byrnes  and  A.  Lindquist  [3],  where 
foundations  of  our  framework  were  first  laid  out,  received  the  G.S.  Ax- 
elby  outstanding  paper  award  from  the  IEEE  Control  Systems  Society 
in  2003,  and  a  U.S.  patent  [41]  which  was  based  on  this  and  subsequent 
work.  We  mention  that  earlier  joint  work  of  the  PI  with  Professor  M.C. 
Smith  (University  of  Cambridge)  on  metrics  for  robust  control  analysis 
and  synthesis,  work  which  was  also  supported  by  AFOSR,  received  the 
G.S.  Axelby  outstanding  paper  award  twice.  First  in  1992,  for  a  robust 
control  theory  and  design  tools  for  linear  systems,  and  then  again  in 
1999  for  robust  control  theory  and  tools  for  nonlinear  systems.  The 
recent  work  under  the  current  grant,  on  robust  high  resolution  spectral 
analysis,  can  be  broadly  classified  into  the  following  categories  with 
some  overlaps. 

2.1  High  resolution  analysis  and  applications 

Publications:  [31,  36,  32,  11,  22,  6,  23,  8,  13,  14,  15,  34,  30,  17,  18,  40,  37] 

Our  framework  for  spectral  analysis  was  initiated  in  [19,  20,  21].  It 
was  influenced  by  earlier  joint  collaborative  wx>rk  of  the  PI  with  Chris 
Byrnes  and  Anders  Lindquist  in  [3,  4].  This  also  led  to  a  U.S.  patent 
[41]  on  tunable  high-resolution  spectral  estimators.  Publication  [32] 
explores  basic  tradeoffs  between  resolution  and  robustness  of  such  es¬ 
timators,  and  outlines  how  to  tune  these  for  optimal  performance.  In 
[36,  37,  40,  35]  we  explain  advantages  of  the  new  techniques  and  in¬ 
sight  in  antenna  arrays,  SAR,  and  in  multi-rate  signal  processing.  In 
[31,  34]  we  demonstrated  the  use  of  our  new  methods  in  non-invasive 
ultrasound  temperature  sensing.  This  work  is  on-going.  The  goal  is 
to  use  non-invasive  sensing  for  computer  controlled  tumor  ablation.  In 
[13,  14,  15]  we  pointed  out  that  a  key  step  in  spectral  analysis,  the  step 
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of  estimating  statistics,  may  not  provide  data  which  are  consistent  with 
underlying  dynamics.  In  this  case,  resolution  can  be  dramatically  im¬ 
proved  if  care  is  taken  to  adjust  the  statistics  so  as  to  conform  with 
known  underlying  dynamics  (in  a  way  which  extends  the  celebrated 
contribution  by  Burg  [27]  to  the  case  of  multivariable  processes).  Pub¬ 
lications  [30,  23,  11,  22,  18]  deal  with  assessing  the  level  of  spectral 
uncertainty,  and  then  presenting  canonical  decompositions  for  use  in 
spectral  analysis  problems. 

2.2  Multi- variable  Sz  multi-dimensional  moments 

Publications:  [9,  11,  22,  8,  12,  13,  14,  17] 

In  these  publications  we  solve  the  most  general  multi-variable  and 
multi-dimensional  moment  problem.  In  a  very  general  sense,  the  data 
for  modeling,  identification,  spectral  analysis,  etc.  amount  to  moment 
constraints  on  a  power  density  function  which  is  possibly  multivariable 
(matrix- valued)  and  multidimensional  (spatio-temporal).  Our  theory 
in  [9]  gives  a  way  to  determine  and  parametrize  all  consistent  distribu¬ 
tions.  Publications  [22,  11]  develop  further  a  very  important  “bound¬ 
ary”  case  of  singular  data  sets.  Publications  [12,  13,  14]  are  mostly  on 
a  static  but  multivariable  version  of  such  problems  and  corresponding 
numerical  issues. 

The  importance  of  this  stems  from  the  fact  that  it  encompasses  the 
most  general  situation  where  data  is  available  from  a  distributed  array 
of  sensors.  The  sensors  do  not  need  to  collect  the  data  in  synchro¬ 
nized  manner.  Also  the  data  may  originate  in  a  spacio-temporal  and 
multi-variable  distribution  of  scattered  power.  Our  framework  allows 
for  a  computational  theory  for  integrating  data  collected  by  such  dis¬ 
tributed  arrays  of  sensors  for  the  purpose  of  identifying  the  distribution 
of  power  and  properties  of  the  underlying  scattering  field.  Such  prob¬ 
lems  are  encountered  in  a  wide  range  of  applications  which  includes 
radar  technology,  ultrasound  imaging,  and  others. 

2.3  Analytic  interpolation  with  degree  constraint 

Publications:  [5,  22,  8,  9,  17] 

The  problem  of  analytic  interpolation  with  degree  constraint  was  in- 
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troduced  in  the  Pi’s  work  in  the  early  1980’s  (1983  Ph.D.  thesis).  Im¬ 
portant  contributions  by  Chris  Byrnes  and  Anders  Lindquist  re-kindled 
interest  in  the  problem,  and  in  the  joint  work  (together  with  A.  Megret- 
ski)  [5]  a  rather  complete  theory  for  the  (scalar  version)  of  the  problem 
was  finally  completed.  Interest  in  this  problem  stems  from  applica¬ 
tions  in  control  and  signal  processing.  The  theory  in  [22,  8,  9,  17]  is 
relevant  in  addressing  the  multivariable  version  of  the  problem  (which 
is  essential  for  multivariable  control  applications).  Work  on  the  multi- 
variable  problem  is  still  in  progress  along  the  lines  of  [9]  and  will  appear 
shortly.  This  work  entails  a  complete  parametrization  of  all  solutions 
to  a  multivariable  Nehari  type  of  interpolation  problem,  which  have  a 
Macmillan  degree  less  than  or  equal  to  the  generic  degree  prescribed 
by  the  problem  data. 

2.4  Feedback  control  design 

Publications:  [44,  10,  16,  39] 

Reference  [44]  deals  with  the  numerical  solution  of  certain  optimal  peri¬ 
odic  control  problems.  These  arose  in  studies  in  periodic  drug  delivery 
and  in  chemical  engineering  applications.  References  [10,  16,  39]  deal 
with  applying  our  theory  on  interpolation  with  degree  constraint  to 
controller  design  with  dimensionality  constraints.  Work  in  [10]  com¬ 
pares  our  framework  for  controller  design  with  degree  constraints  to  an 
alternative  which  is  based  on  solving  linear  matrix  inequalities. 

2.5  Metrics  for  spectral  uncertainty 

Publications:  [24,  25] 

Despite  the  centrality  of  spectral  analysis  in  a  wide  range  of  scientific 
disciplines,  no  agreement  exists  as  to  what  an  appropriate  distance 
measure  between  spectral  density  functions  is.  Some  of  the  key  con¬ 
tenders  have  been  the  Bregman  distances,  the  Kullback-Leibler-von 
Neumann  distance,  the  Itakura-Saito  distance,  and  finally  Battachar- 
rva  and  Mahalanobis-type  variants.  Certain  of  these  distances  have  a 
definite  relevance  when  used  to  discriminate  between  two  probability 
density  functions.  Yet  none  has  any  meaningful  interpretation  when 
applied  to  power  spectral  density  functions. 
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In  fact,  surprisingly,  the  question  on  how  to  quantify  uncertainty 
in  spectral  analysis  has  received  little  attention  in  the  past.  The  de¬ 
velopment  of  suitable  metrics  for  quantifying  distances  between  power 
spectra  has  been  a  focus  in  our  research.  These  metrics  are  essentials 
tool  for  assessing  uncertainty,  robustness,  and  performance  of  signal 
analysis  techniques.  Our  formalism  mimics  ideas  from  Information  ge¬ 
ometry  which  was  invented  so  as  to  quantify  uncertainty  in  statistical 
inference.  Information  geometry  endows  probability  distributions  with 
a  natural  (Fisher  infromation)  metric.  We  similarly  endow  density 
functions  with  a  metric  with  a  natural  interpretation  based  on  pre¬ 
diction  theory.  We  developed  new  distance  measure  between  power 
spectral  densities  [24,  25].  These  metrics  are  the  first  with  any  clear 
interpretation.  The  relevant  notion  of  distance  quantifies  differences 
in  predictability  properties  between  respective  random  processes.  It 
can  be  used  to  detect  subtle  changes  in  the  spectral  content  of  nonsta¬ 
tionary  signals  and,  thereby,  effectively  identify  drifts,  transitions,  and 
events. 


Chapter  3 

Technical  Highlights 


The  subject  of  our  research  relates  to  modeling  and  identification  of 
multivariable  and  multidimensional  time-series  with  a  focus  on  sensor 
arrays.  The  focus  has  been  an  apparent  need  for  robust,  flexible,  and 
high  resolution  analysis  tools.  The  hallmark  of  our  approach  has  been 
the  development  of  algorithms  with  a  built-in  ability  to  focus  in  on  par¬ 
ticular  futures  of  recorded  signals.  The  intended  applications  include 
identification,  array  signal  processing,  and  data  mining. 

In  Section  3.1,  we  overview  methods  for  multivariable  spectral  anal¬ 
ysis  based  on  generalized  statistics.  These  inherit  superior  resolution 
properties  from  utilizing  generalized  statistics  and  represent  a  gener¬ 
alization  of  our  earlier  Tunable  High  REsolution  Estimator  (THREE) 
reported  in  [4].  We  discuss  additive  decompositions  of  covariance  statis¬ 
tics  and  their  significance  in  identification,  and  outline  basic  theory 
which  underlies  our  advances  on  high  resolution  identification  and  spec¬ 
tral  analysis  [3,  4,  6,  7,  8,  9,  19,  21,  22,  31,  32]. 

In  the  backdrop  of  robust  control  and  of  the  gap  metric  (introduced 
by  Zames  and  El-Sakkary  and  developed  during  the  past  decade  by 
the  PI  in  collaboration  with  M.C.  Smith),  the  PI  seeks  quantitative 
measures  of  signal  affinity  as  well  as  objective  distances  between  signal 
statistics.  Traditional  vector  metrics  (e.g.,  Euclidean)  are  not  satisfac¬ 
tory  as  distances  between  statistical  quantities  since  they  do  not  ac¬ 
knowledge  their  structure  as  a  positive  cone.  Alternative  entropy-based 
metrics  are  sought  and  a  natural  intrinsic  metric  between  spectral  den¬ 
sity  functions  is  discussed  in  Section  3.2.  The  purpose  is  to  quantify 
uncertainty  and  resolution  in  signal  analysis,  to  assess  robustness  of 
signal  processing  algorithms,  and  to  facilitate  correlation  analysis  in 
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large  databases. 

Part  of  the  motivation  and  impetus  for  the  research  has  been  pro¬ 
vided  by  our  recent  collaborative  work  on  two  fronts:  Non-invasive 
sensing  for  medical  applications,  and  distributed  sensing  and  control  for 
vibration  isolation.  Measuring  heart  conditions  such  as  mitral  regurgi¬ 
tation  or  measuring  temperature  inside  tissue  for  purposes  of  computer 
guided  tumor  ablation  or  therapy,  in  a  non-invasive  manner,  requires 
exceedingly  high  resolution  and  robustness  (see  e.g.,  31]) .  Another 
ubiquitous  situation  arises  when  a  large  collection  of  signals  becomes 
available  (e.g.,  via  a  distributed  sensor  array  or  in  a  large  database) 
and  fast  and  high  resolution  correlation  analysis  is  sought  to  identify 
affinity,  coherence,  and  relevance  of  various  signals  Such  is  the  task 
when  we  seek  to  identify  sources  of  excitations  in  distributed  media,  the 
origin  and  pathway  of  disturbances  in  spatial  structures,  for  the  pur¬ 
poses  of  modeling/prediction/control  of  distributed  physical  systems 
— a  model  application  being  the  disturbance  isolation  of  a  targeting 
system  (e.g.,  a  laser)  using  feedback  from  measurements  collected  us¬ 
ing  a  distributed  sensor  array.  Similar  tasks  are  pertinent  in  signal 
classification  and  system  identification,  in  general. 

On  the  technical  side,  the  research  reported  herein  has  led  to  re¬ 
liable  methods  for  high  resolution  spectral  analysis  of  multivariable 
processes,  as  well  as  to  distance  measures  for  quantitative  assessment 
of  uncertainty  and  of  resolution  in  signal  analysis  applications.  In  the 
last  section  3.3,  as  an  example,  we  outline  certain  Matlab-based  rou¬ 
tines  and  their  use  for  high  resolution  analysis  of  scalar  time-series. 
There  routines  are  available  the  PPs  website. 


3.1  Generalized  statistics  &  resolution 

Our  first  step  is  to  explain  what  is  meant  by  “generalized  statistics’  and 
why  is  this  concept  important.  Consider  a  stationary  random  process 
{u/J  with  zero  mean  and  power  spectral  density  fu{9).  The  autocor¬ 
relation  samples  Rk  =  £{uiui+k]  of  Uk  arc  the  Fourier  coefficients  of 


/u(0),  i  e., 


for  A;  =  0,  ±1,  ±2, . . . , 
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R-k  =  R*k ,  while  fu(0)  is  given  by  the  Fourier  series  YlkL-oo  Rke~jk6- 
Occasionally,  {uk}  is  not  directly  observable  in  which  case  one  may 
not  be  able  to  estimate  autocorrelation  samples.  For  instance,  if  xk  = 
axk- 1  +  uk  —  buk~ i  is  a  first-order  system  (— 1  <  a  <  1)  and  if  only 
{xk}  is  available,  then  it  is  natural  to  estimate  statistics  of  {x^}  in¬ 
stead.  These  statistics  represent  moments  of  fu{9)  with  respect  to 
kernel  functions  which  differ  from  e*k$.  For  example,  the  variance  of 


k  > 


1 2fu(9)dd 


is  a  moment  of  fu{9)  with  respect  to  the  kernel  function  \(ej0 ~b)/(ej0  — 
a) |2.  Such  filtering  may  inherently  be  part  of  a  measuring  apparatus, 
but  it  may  also  be  introduced  to  improve  S/R  and  resolution  as  dis¬ 
cussed  in  e.g,  [32,  20]. 

Thus,  in  general,  it  is  customary  to  refer  to  any  moments  of  fu{0) 
as  generalized  statistics  of  the  underlying  random  process.  Not  all  such 
moments  originate  in  ordinary  time-filtermg,  and  not  all  correspond  to 
rational  kernel  functions.  In  fact,  a  most  challenging  and  very  com¬ 
mon  situation  arises  when  the  indexing  in  {uk}  refers  to  space  and 
not  time.  Take  for  instance  an  array  of  sensors  with  three  elements, 
linearly  spaced  at  distances  1  and  \/2  wavelengths  from  one  another, 
and  assume  that  (monochromatic)  planar  waves,  originating  from  afar, 
impinge  upon  the  array.  This  is  exemplified  in  Figure  3.1.  Assuming 

Eo  E\  J?2 

o  o  o 


that  the  sensors  are  sensitive  to  disturbances  originating  over  one  side 
of  the  array,  with  sensitivity  independent  of  direction,  the  signal  at  the 


16 


CHAPTER  3.  TECHNICAL  HIGHLIGHTS 


£th  sensor  is  typically  represented  as  a  superposition 

A(d)e^^t~pxtCos^+^e^d6, 

of  waves  arising  from  all  spatial  directions  0  6  [0, 7r],  w'here  ui  is  the  an¬ 
gular  time-frequency  (as  opposed  to  “spatial” ),  the  distance  between 
the  £th  and  the  0th  sensor,  p  the  wavenumber,  and  A(6)  the  amplitude 
and  <p{6)  a  random  phase  of  the  0-component.  Typically,  the  phase 
< f>(6)  for  various  values  of  6  are  uncorrelated.  The  term  px(  cos(d)  in 
the  exponent  accounts  for  the  phase  difference  between  reception  at 
different  sensors.  For  simplicity  we  assume  that  p  —  1  in  appropriate 
units.  Correlating  the  sensor  outputs  we  obtain 

Rk  =  E{utlut2}:=  [\-^mde  (3.1) 

Jo 

where  f(6)  =  \A{8)\2  now  represents  power  density,  and  k  =  i\—li  with 
>  £2  and  belonging  to  {0, 1,  V2  +  1}  ( k  is  kept  as  a  “non-integer” 
index  in  Rk  from  mnemonic  purposes).  Thus, 

kel:=  {0,1,  A^+l}.  (3.2) 

The  only  significance  of  our  selection  of  distances  between  sensors,  that 
gave  rise  to  the  rather  unusual  indexing  set  (3.2),  is  to  underscore  that 
there  is  no  algebraic  dependence  between  the  kernel  functions 

j  g-jcos(e)^  e-j%/2cos(0),  e-j(v/2+l)cos(0) 

Even  more  challenging  situations  arise  when  (i)  the  kernel  functions 
represent  Green’s  functions  or  transfer  functions  in  a  general  spatial 
domain,  e.g.,  in  case  sensors  are  scattered  in  a  random  pattern  in  R3, 
and  (ii)  when  statistics  are  obtained  from  observations  which  are  non- 
equispaced  in  time  (also,  random  sampling). 

Let  us  revisit  the  situation  of  ordinary  autocorrelation  samples. 
Given  a  finite  observation  record  (u0,  . . . ,  u^}  we  typically  estimate 
a  finite  sequence  {i^o.  Ri,  . . . ,  Rn}  (e.g.,  via  sample  averaging).  The 
Toeplitz  matrix 


R* 

R—n 

Ri 

Ro  • 

.  .  R—(n— 1) 

.  Rn 

Rn-i  • 

..  Ro 

T  •= 

-in  •  — 
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is  nonnegative  definite  since  fu{6)  >  0  and  since 


Tn  =  T f^GWuioWydd. 


where 


G(eje)={  1  ej$  ...  e>ne  ]', 


and  denotes  transpose  while  denotes  complex  conjugate  trans¬ 
pose.  The  (column)  “Fourier  vector”  G{ej0)  is  referred  to  as  a  Fourier 
vector,  and  as  6  varies,  it  defines  a  curve  in  a  complex  space  which  is 
known  as  the  “array  manifold” .  Non-negativity  of  Tn  turns  out  to  be 
sufficient  for  the  existence  of  a  power  spectral  density  which  is  con¬ 
sistent  with  the  moments  in  Tn.  There  is  a  rather  rich  theory  on  how 
much  Tn  is  telling  us  about  the  power  spectum,  and  how  to  reconstruct 
representative  spectra  (maximum-entropy,  etc.)  which  are  consistent 
with  the  partial  sequence  of  autocorrelation  statistics.  This  goes  back 
to  the  theory  of  the  trigonometric  moment  problem  and  of  orthogonal 
polynomials,  and  forms  the  basis  of  the  so-called  “modern  nonlinear 
spectral  analysis  methods”  [27].  An  alternative  way  to  reconstruct 
/u(0),  based  on  Tny  is  the  periodogram/correlogram 


m  :=  — -—rG(e’eyTnG(e]e). 
n  4- 1 


(3.3) 


This  is  an  approximation  of  /u(0)  —  see  [38],  as  is  the  Capon  (or 
maximum-likelihood)  estimate  of  /u(0)  given  by 


/(*)  :=  ~  (<?(^)*(Tw)“1G(e>*))_1 . 


(3.4) 


Weighted  versions  of  the  autocorrelation  coefficients  can  be  used  in 
order  to  trade-off  resolution  with  robustness.  I.e.,  using  various  win¬ 
dowing  functions  Wk  (Hamming,  Kaiser,  etc.)  one  may  replace  Rk  with 
RkWk  in  the  above.  These  ideas  are  classical,  were  extensively  studied 
decades  ago,  and  remain  the  workhorse  of  signal  analysis  applications 
to  this  day.  Yet,  it  is  a  striking  fact  that  a  multivariable  version  of 
such  successful  tools  has  largely  been  absent  (i.e.,  a  periodogram-like 
method  for  inherently  multivariable  processes).  In  our  software  tools 
we  generalize  and  take  advantage  of  similar  ideas  in  the  context  of 
generalized  statistics,  replacing  the  state  covariance  R  by  Hadamard 
products  with  suitable  windowing  matrices. 
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A  further  striking  fact  is  that  the  corresponding  issues  when  G{e]6) 
is  not  an  ordinary  Fourier  vector,  have  not  been  studied  with  the  excep¬ 
tion  of  the  somewhat  ad-hoc  beamspace  techniques.  The  recent  work 
by  the  PI  (see  [9,  8,  40])  has  addressed  such  issues  on  a  firm  theoreti¬ 
cal  basis.  For  instance,  returning  to  the  example  of  the  non-equispace 
antenna  array  in  Figure  3.1  and  Rk  for  k  G  T  in  (3.1),  it  is  important 
to  determine  whether  estimated  values  for  the  moments  are  consistent 
with  the  geometry  of  the  array,  and  if  so  to  characterize  all  consistent 
power  spectra.  In  the  present  situation  (packaging  Rk  s  in  (3.1)  into  a 
matrix  and  setting  r  —  cos(d))  the  nonnegativity  of 


R:= 


1 

e~jT 

e-i^T 


/(cos  Hr)) 
\Zl  —  T2 


eJv^r]  dr 


which,  in  the  obvious  indexing  turns  out  to  be 


R  = 


R* 

A 

A/2+1 


^1  ^v/2+1 

Ro  Ry/\ 2  > 

Ry/2  Ro 


(3.5) 


is  only  a  necessary  condition.  The  fact  that  it  is  not  sufficient  (see  e.g., 
[7,  page  786])  motivated  our  recent  work  and  led  to  a  rather  complete 
answer /theory  documented  in  [9,  8]. 

We  now  highlight  some  of  the  important  findings.  First  we  deal 
with  the  case  where  the  uFourier  vector”  G(ej0)  is  replaced  by  the 
transfer  function  of  a  linear,  time-invariant,  discrete-time  (input-to- 
state)  dynamical  system 


xh  =  Ar*;-i  +  Buk)  with  k  G  Z, 

xk  being  the  state-vector  and  A,B  matrices  in  Rnxn  and  Knxm,  re¬ 
spectively.  Then  the  input- to-state  transfer  function  G{e^e)  =  (I  — 
ejdA)~lB  is  matricial  and  the  random  process  {uk}  vectorial.  If  uh  is 
white  noise  (with  covariance  matrix  Q  >  0),  then  it  is  well  known  and 
easy  to  see  that  the  state  covariance 

R  :=  £{xhx'k } 

satisfies  the  Lyapunov  equation  R  —  ARAf  =  BQB'.  However,  surpris¬ 
ingly,  the  case  where  uk  is  not  white  was  dealt  only  recently  (under 
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AFOSR  support  by  the  PI  in  t20,  21]).  The  correspondence  between 
R,  A,  B  and  input  power  spectra  fu  is  detailed  in  [20,  21,  22]  Briefly, 
a  state  covariance  for  the  above  system  satisfies 


rank 


R-ARA'  B 

B *  0 


=  rank 


0  B 
B*  0 


(3-6) 


where  0  is  the  zero  matrix  of  appropriate  dimension.  An  alternative 
characterization  amounts  to  the  solvability  of 


R  -  ARA!  =  BHf  +  HBf 


for  a  matrix  H  which  is  of  the  same  size  as  B.  Conversely,  provided 
R  satisfies  either  of  the  above  two  equivalent  conditions,  and  provided 
it  is  non-negative  definite,  there  exists  a  power  spectrum  for  a  candi¬ 
date  input  that  gives  rise  to  such  state-statistics  (this  was  shown  in 
[20]).  The  parametrization  of  all  consistent  power  spectra  and  related 
computational  issues  has  been  the  subject  of  [20,  21,  22].  The  relevant 
realization  theory  for  matricial  power  spectral  densities  amounts  to 
analytic  interpolation  with  positive-real  matricial  functions  and  thus, 
echoes  a  lot  of  the  usual  tools  and  constructions  in  H^- control  theory. 

The  motivation  for  considering  state-covariances  of  linear  systems, 
was  to  develop  a  theory  for  high  resolution  spectral  analysis  following 
[3,  4,  19].  Our  joint  work  with  C.  Byrnes  and  A.  Lindquist  led  to  a  U.S. 
patent  [41].  The  main  idea  in  [3]  arose  from  the  simple  observation  that 
the  autocorrelation  samples  of  a  time-series  correspond  to  interpolation 
conditions  for  a  positive-real  function  related  to  the  power  spectrum, 
at  the  origin.  In  some  detail,  if  Rk  =  £{ueUt+k}  as  before,  then  the 
power  spectral  density  /u(0)  is  simply  the  real  part  of  the  positive  real 
function 

F{z)  =  2 -J  =  R<>  +  2RlZ  +  2R2Z 2  •  •  •  • 

Thus,  the  Rk  s  relate  to  the  value  of  F(z)  and  its  derivatives  at  the 
origin.  This  fact  generalizes  to  cases  where  the  statistics  are  taken  at 
the  state  or  output  of  any  dynamical  system.  For  instance,  if  xk  = 
axk_i  +  uk  is  a  first-order  system  and  —1  <  a  <  1  then 
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from  which  we  readily  obtain  an  interpolation  constraint  on  F(z)  at 
z  —  a.  In  general,  intuitively,  superior  resolution  is  achieved  by  select¬ 
ing  data-dependent  interpolation  constraints  at  points  proximal  to  the 
unit-disc  sector  of  a  targeted  frequency  band.  In  general,  the  filter  may 
reflect  dynamics  of  sensors  but  it  can  also  be  virtual,  focusing  on  the  fre¬ 
quency  range  of  interest.  Given  interpolation  constraints  for  the  power 
spectrum,  a  whole  range  of  tools  of  the  nonlinear  methods  [27]  extends 
to  this  framework  (encompassing  so-called  beamspace  techniques  in  an¬ 
tenna  arrays).  The  design  of  input-to-state  filters  and  relevant  tradeoffs 
between  robustness  and  resolution  have  been  addressed  in  32],  and  is 
part  of  continuing  research  and  development  of  algorithms. 

We  now  highlight  the  case  where  the  “Fourier  vector”  is  replaced 
by  a  Green’s/transfer  function  G(eje)  with  no  apparent  shift  structure. 
Turning  once  more  to  the  non-equispaced  antenna  array  in  Figure  3.1, 
we  seek  a  power  density  function  f(6)  consistent  writh  the  statistics 
which  is  closest  to  a  “prior”  /prior  (0)  in  the  sense  of,  say,  a  Kullback- 
Leibler  distance 

S(/||/prior)  :=  ^  J  (/prior  log(/prior)  -  /prior  log(/))  d0. 

The  minimizing  solution  can  be  written  in  closed  form 

/priori) 

IK)  Re{\0G(e>°)} 

where  A0  denotes  a  (row)  vector  of  Lagrange  multipliers  for  the  min¬ 
imization  problem.  These  multipliers  can  be  easily  computed  so  that 
f(8)  abides  by  the  given  statistics,  provided  of  course  that  the  statis¬ 
tics  are  consistent  with  the  structure  of  G(ej6).  A  homotopy  method 
was  proposed  in  [8,  9  leading  to  a  differential  equation  for  A (r)  in  a 
homotopy  variable  r.  If  the  statistics  are  consistent  with  the  structure 
of  G(eje),  then  A(r)  — ►  A0  as  r  — >  1,  otherwise  A (r)  escapes  to  oo.  The 
role  of  /prior  is  to  introduce  prior  information,  but  can  also  be  used  to 
parametrize  all  solutions  to  the  moment  problem,  since  choices  of  /pr ior 
lead  to  the  complete  set  of  /’ s  such  that 

R  =  ^J*  G^e)f{6)G^eYde.  (3.7) 

We  would  like  to  emphasize  that  the  theory  in  [9]  applies  to  the  case 
of  matricial  power  density  functions  fu  (e.g.,  spectral  density  functions 
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of  multi- variable  processes),  as  well  as  to  cases  where  the  support  is 
rnulti-dimensional  (e.g.,  space- time  distributions,  or  6  €  R*  with  I  >  1 
in  general)  in  which  case  the  integrals  are  interpreted  accordingly. 

The  observation  that  singularities  in  a  covariance  matrix  reveal  de¬ 
terministic  linear  dependences  between  observed  quantities,  forms  the 
basis  of  a  wide  range  of  techniques,  from  Gauss’  least  squares,  to  prin¬ 
cipal  component  analysis  (PCA,  GPCA),  to  modern  subspace  methods 
in  time-series  analysis.  This  observation  suggests  that  a  decomposition 
of  covariance  data  into  “signal  4-  noise,”  in  accordance  with  a  suitable 
postulate,  leads  to  identification  of  such  deterministic  dependences. 

We  first  discuss  the  implications  of  the  observation  in  time-series 
analysis.  Here,  traditionally,  one  seeks  a  white-noise  component  of 
maximal  variance  which  is  consistent  with  estimated  statistics.  For  in¬ 
stance,  if  Tn  represents  the  (n  + 1)  x  (n-b  1)  Toeplitz  matrix  formed  out 
of  the  first  n  +  1  autocorrelation  samples  of  a  scalar  random  process, 
the  minimal  eigenvalue  Amin(Tn)  of  Tn  represents  the  maximal  power 
of  white  noise  which  is  consistent  with  this  autocorrelation  data.  Fur¬ 
thermore,  Tn  —  Amin(Tn)J  is  singular  and  corresponds  to  a  determin¬ 
istic  random  process  made  up  of  at  most  n-complex  sinusoidal  com¬ 
ponents.  This  fact  (albeit  in  a  different  language)  was  already  known 
to  Caratheodory  and  Fejer  in  the  early  part  of  the  20th  century,  and 
was  used  by  them  to  show  that  positivity  of  Tn  is  sufficient  for  the 
solution  of  the  relevant  trigonometric  moment  problem.  It  was  recog¬ 
nized  by  Pisarenko  in  the  1960’s  for  its  relevance  in  signal  analysis  and 
this  fact  forms  the  basis  of  certain  widely  used  high  resolution  methods 
for  spectral  analysis  known  as  MUSIC  (Multiple  Signal  Classification) 
and  ESPRIT  (Estimation  of  Parameters  by  Rotational  Invariant  Tech¬ 
niques)  — see  [7,  19,  38]. 

Despite  being  widely  used,  no  multivariable  generalization  of  the 
Caratheodory-Fejcr-Pisarcnko  decomposition  had  been  devised,  until 
the  Pi’s  recent  work  [22]  under  the  current  AFOSR  grant.  In  [22]  we 
have  shown  that  a  direct  multivariable  analog  is  not  possible.  More 
specifically,  we  have  shown  that  after  we  account  for  white  noise  of 
maximal  power  consistent  with  the  data,  the  remaining  variance  can¬ 
not  be  accounted  for  by  pure  sinusoids  (i.e.,  by  a  purely  deterministic 
signal).  Yet,  often  the  “white  noise”  hypothesis  is  suspect.  Further¬ 
more,  in  sensor  arrays  the  hypothesis  of  mutual  couplings  and  local 
effect  of  scatterers  suggests  the  presence  of  noise  with  short  Twinge 
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correlation  structure  (e.g.,  the  analog  of  say,  MA(T)  or  MA{2 )  in 
time-series).  In  an  effort  to  address  such  practical  issues,  in  [22,  11]  we 
develop  canonical  decompositions  of  second  order  statistics  accounting 
for  noise  with  short-range  correlations.  Such  problems  are  naturally 
formulated  as  semi-definite  programs  and  efficiently  solved  with  exist¬ 
ing  software. 

We  wish  to  acknowledge  the  influence  of  early  critique  by  R  E. 
Kalman,  in  the  context  of  econometric  “error-in-variables”  models, 
on  this  line  of  research.  Indeed,  ordinary  least-squares  often  lead  to 
ill-posed  solutions  (as  most  clearly  demonstrated  in  the  econometric 
Frisch- Reiersj61  problem  [29]).  The  motivation  behind  [22]  has  been  to 
address  the  case  where  information  is  available  regarding  noise  statis¬ 
tics  (typified  by  mutual  couplings  and  interference  in  sensor  arrays)  and 
employ  the  system  theoretic  maxim  that  a  maximal  set  of  dependences 
is  to  be  sought.  The  essence  of  our  research  has  been  to  decompose 
second  order  statistics  into  a  sum  which  reflects  noisy  and  determin¬ 
istic  components.  Accordingly,  the  decomposition  is  canonical  and/or 
minimal  in  a  suitable  sense  (see  [22]).  The  formalism  in  [22]  applies  to 
the  setting  of  distributed  sensor  arrays. 

3.2  Distances  between  power  spectra 

Despite  the  centrality  of  spectral  analysis  in  a  wide  range  of  scientific 
disciplines,  no  agreement  exists  as  to  what  an  appropriate  distance 
measure  between  spectral  density  functions  is.  Some  of  the  key  con¬ 
tenders  have  been  the  Bregman  distances,  the  Kullback-Leibler-von 
Neumann  distance,  the  Itakura-Saito  distance,  and  finally  Battachar- 
rya  and  Mahalanobis-type  vaxiants.  Certain  of  these  distances  have  a 
definite  relevance  when  used  to  discriminate  between  two  probability 
density  functions.  Yet  none  has  any  meaningful  interpretation  when 
applied  to  power  spectra. 

We  present  a  new  distance  measure  between  power  spectral  densities 
and  in  fact,  a  (pseudo-)  metric,  which  has  a  clear  interpretation  rooted 
in  prediction  theory.  This  is  based  on  [24,  25]. 

Our  starting  point  is  to  consider  the  degradation  of  the  variance  of 
the  prediction  error  when  the  design  of  the  predictor  is  based  on  the 
wrong  choice  among  two  alternatives.  More  specifically,  let  fi ,  fi  rep¬ 
resent  spectral  density  functions  of  discrete-time  zero-mean  stochastic 
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processes  Uft(k )  (i  G  {1, 2}  and  k  €  Z),  and  let  pj^t)  {£  G  {1,  2, 3, . . .}) 
represent  values  for  the  coefficients  that  minimize  the  linear  prediction 
error  variance 

OO 

£{M<>) -!>(«<*,, (-<)|2}. 

*=1 

Thus,  the  optimal  set  of  coefficients  depends  on  the  power  spectral 
density  function  of  the  process,  a  fact  which  is  acknowledged  by  the 
subscript  in  the  notation  Pft{£).  It  is  reasonable  to  consider  as  a  dis¬ 
tance  between  f\  and  /2  the  degradation  of  predictive  error  variance 
when  the  coefficients  p{£)  are  selected  assuming  one  of  the  two,  and 
then  used  to  predict  a  stochastic  process  corresponding  to  the  other 
spectral  density  function.  The  ratio  of  the  “degraded”  predictive  error 
variance  over  the  optimal  error  variance 


, ,  £{|tt/,(Q)  -  Eg,  (-f)l2} 

■  «{M0) -E£,p/, Mt./, (-<)!’} 

turns  out  to  be  equal  to  the  ratio  of  the  arithmetic  over  the  geomet¬ 
ric  means  of  the  fraction  of  the  two  spectral  density  functions,  see 
[24,  25].  Then,  the  logarithm  logp(/i,/2)  =:  <J(/i, /2)  represents  a 
measure  of  dissimilarity  between  the  “shapes1'  of  /,  and  /2  and,  can  be 
viewed,  as  analogous  to  “divergences”  of  Information  Theory  (such  as 
the  Kullback-Leibler  relative  entropy).  Indeed, 


^(/i./a)  =  log 


(\_  rmde\  i  r 

\2n  J_7rf2(d)2TT  j  2 ttJ_„ 


dd 

27T 


(3.8) 


vanishes  only  when  /1//2  is  constant  on  [ — tt,  tt]  and  is  positive  other¬ 
wise.  Considering  the  distance  6(/,  /  +  A)  between  a  nominal  power 
spectral  density  /  and  a  perturbations  /  +  A,  eliminating  cubic  terms 
and  beyond,  leads  (modulo  a  scaling  factor  of  2)  to  the  Riemannian 
pseudo-metric 


on  density  functions.  It  was  a  pleasant  surprise  that,  geodesic  paths 
fT  (t  E  [0, 1])  connecting  spectral  densities  /o,/i  and  having  minimal 

length  /’  x/^/r./r+dr)  =  fo  \j9fAW)dT,  can  be  explicitly  computed 
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[24].  These  turn  out  to  be  logarithmic  intervals  (also  referred  to  as 
exponential  families), 

Me)  =  flT{Q)fm  for  r  G  [0, 1],  (3.10) 


between  the  two  extreme  points.  Furthermore,  the  length  along  such 
geodesics  can  be  explicitly  computed  in  terms  of  end  points 


d3{fuh) 


-Mi 


“8)' 


dd 

2n 


-  {hi 


log 


W)  dd 

f2{6)  27 r 


(3.11) 

I.e. ,  it  is  the  “standard-deviation”  of  the  difference  log(/i)  —  log(/2). 
This  is  a  pseudo-metric.  The  “pseudo”  refers  only  to  the  fact  that  it 
does  not  account  for  constant  multiplicative  factors. 

It  is  rather  interesting  to  point  out  that  the  /  *•  log(/)  maps  power 

spectral  densities  onto  a  Euclidean  space  where  quadratic  norms  such 
as  (3.11)  have  a  clear  interpretation.  In  fact,  with  respect  to  the  Rie- 
mannian  metric  (3.9)  that  we  introduced,  the  space  has  zero  curvature 
since  geodesics  axe  “logarithmic”  straight  lines.  From  this  vantage 
point  one  may  also  consider  alternative  norms  such  as  ||  log(-~()[|2,  etc. 
though  without  yet  a  natural  interpretation. 

It  is  interesting  to  compare  the  differential  structure  on  power  spec¬ 
tral  density  functions  that  we  introduced  above  with  the  corresponding 
differential  structure  of  “Information  Geometry.”  In  Information  Ge¬ 
ometry  f(0)  corresponds  to  a  probability  density  on  [ — 7r,  7r]  and  the 
natural  Riemannian  metric  is  the  Fisher  information  metric  is  (cf.  1, 
page  28])  which  can  be  expressed  in  our  framework  as 


5Fisher,/  (^) 


_l  r 

2 W-* 


*  A(fl)2  dd 
f(9)  2tt 


(3.12) 


(with  £  =  1  and  h  J-A(0)€  =  0  since  both  />  /  + A  need 

to  be  probability  densities).  Direct  comparison  reveals  that  the  powers 
of  f(9)  in  (3.9)  and  (3.12)  are  different.  Thus,  it  is  curious  and  worth 
underscoring  that  in  either  differential  structure,  geodesics  and  geodesic 
lengths  can  be  computed.  For  completeness  we  note  that  Information 
Geometry  is  a  vast  subject,  originating  in  the  work  of  Rao,  Amari, 
Cencov  and  others,  with  a  large  following  directed  towards  analogous 
geometric  interpretations  in  Quantum  theory.  The  staxting  point  of 
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Information  Geometry  may  be  considered,  in  a  way  analogous  to  our 
development  ,  to  be  the  degradation  of  coding  efficiency  when  the  wrong 
choice  between  two  probability  distributions  /i  and  fi  is  assumed.  This 
degradation  is  precisely  the  Kullbaek-Leibler  distance  between  the  two, 
which  can  then  give  rise  to  the  Riemannian  metric  <?Fisher,/(A),  in  a  way 
analogous  to  our  construction  of  g/( A)  from  5(-,  •). 

The  idea  to  employ  the  degradation  of  performance  with  regard 
to  specific  tasks  when  the  wrong  choice  between  alternatives  is  used 
extends  easily  to  a  variety  of  contexts,  and  should  be  useful  for  that 
purpose  as  well.  An  alternative  paradigm  can  be  built  on  smoothing 
problems  which  we  take  up  next  but  with  a  different  goal  in  mind, 
namely,  to  provide  an  alternative  to  the  maximum  entropy  principle. 

The  maximum  entropy  principle,  as  it  is  often  invoked  in  time-series 
analysis  ([28]),  suggests  the  selection  of  a  power  spectrum  which  is  con¬ 
sistent  with  autocorrelation  data  and  corresponds  to  a  random  process 
least  predictable  from  past  observations.  While  this  is  a  reasonable  dic¬ 
tum  when  one  is  interested  in  prediction,  it  is  often  used  regardless  of 
the  specific  intent  for  the  sought  spectrum.  The  point  we  wish  to  raise 
becomes  apparent  when  considering  the  relevance  of  another  dictum, 
equally  pertinent,  albeit  based  on  smoothing  instead  of  prediction. 

The  variance  £{|u(0)  —  Y^iPfWu(~ ^)|2}  of  the  optimal  one-step- 
ahead  (linear)  predictor  {t(0|past)  :=  is  the  geometric 

mean  (see  [26,  page  183])  of  /,  i.e., 

£{|u(0)  -  ft(0|past)|2}  =  mo,/  :=  exp  J  log  (f(9))  d6 

This  is  the  content  of  the  celebrated  Kolmogorov-Szego  theorem.  The 
entropy  rate  [27]  is  then  defined  as  the  negative  integral  of  the  logarithm 
of  /  (i.e.,  as  —  /  log (f(8))dd).  The  notation  m0>/,  taken  from  [2,  page 
23]  for  the  geometric  mean ,  is  sought  to  contrast  with  the  expression 
for  the  variance  of  the  error 

5{|xx(0)  ~  {t(0|past  +  future)|2}  =  :=  J  f(9)~ldd 

for  the  optimal  smoothing  filter  xi(0| past  +  future)  :=  £). 

This  expression  has  been  derived  in  [23]  where  it  was  also  noted  that 
it  represents  the  harmonic  mean  of  /.  (Naturally,  the  harmonic  mean 
is  always  <  to  the  geometric  mean,  since  smoothing  uses  more  data.) 
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Applications  abound  where  records  need  to  be  interpolated,  or  where 
the  indexing  of  data  collected  via  a  sensor  array  represents  spatial 
ordering  and  not  time  ordering.  In  all  such  applications  there  is  no 
natural  “time-arrow”  and,  hence,  it  is  imperative  that  Burg’s  maximum 
entropy  principle  is  re-evaluated. 

Thus,  in  the  context  of  time  series  analysis,  both  Burg’s  “pre¬ 
dictive  entropy”  —  f  \og(f(9))dd  and  the  “smoothing  entropy” 

/  f(0)~ld6  derived  in  [23]  in  work  supported  by  the  present  grant, 
relate  to  the  level  of  unpredictability  in  these  two  vastly  different  sit¬ 
uations.  Burg’s  entropy  has  been  also  used  as  a  regularizing  func¬ 
tional  in  inverse  problems  (see  [9]).  But  the  latter  functional  can  be 
used  equally  well  for  similar  modeling  purposes.  For  instance,  we  have 
shown  in  [23]  that  extremal  spectra  with  respect  to  the  second  choice 
give  rise  to  all-pole  Markovian  models  very  much  like  Burg’s  maximum 
entropy  AR-models,  but  with  one  important  difference.  The  poles  in 
these  models  appear  with  fractional  powers.  Such  fractional  powers  are 
often  encountered  in  processes  with  long  “memory.” 

There  is  an  apparent  dichotomy  between  what  a  deterministic  pro¬ 
cess  is,  depending  on  whether  we  consider  a  one-sided  or  a  two-sided 
past.  Stationary  time-series  are  said  to  be  deterministic  in  the  Kol¬ 
mogorov  sense  if  log(/)  £  L\.  When  we  consider  determinism  with 
respect  to  a  two-sided  past,  then  the  corresponding  condition  weakens 
to  f~l  £  Li,  because  it  is  only  then  that  the  smoothing  error  is  zero. 
This  dichotomy  raises  similar  questions  for  spatial  processes  and  fields. 
This  is  especially  pertinent  for  applications  where  space-time  data  are 
collected  via  sensor  arrays. 

Ever  since  the  early  days  of  statistical  mechanics,  entropy  has  been 
a  very  elusive  concept.  Yet,  from  a  mathematical  and  computational 
standpoint,  entropy  and  entropy-rate  functionals  can  be  thought  of  as 
natural  barriers  of  convex  sets  and  positive  cones  (i.e.,  of  probability 
simplices,  or  of  cones  of  spectral  density  functions).  They  thus  can  be 
used  to  identify  solutions  to  ill-posed  inverse  problems.  The  history  of 
such  a  viewpoint  and  of  earlier  developments  can  be  looked  at  in  the 
Pi’s  recent  publication  [9].  Besides  the  usual  entropy  functionals  intro¬ 
duced  by  Shannon-von  Neuman,  Burg,  and  Kullback-Leibler-Linblad- 
Leib,  discussed  in  [9]  there  is  a  plethora  of  alternatives,  such  as  the 
Renyi  entropy,  Tsallis  entropy,  and  more  recently  generalized  means  in 
the  Pi’s  work  [24].  This  last  work  provides,  as  discussed  earlier,  notion 
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of  distance  which  are  compatible  with  prediction  problems  and  are 
suitable  for  incorporating  uncertainty  into  correlation  measurements 
(in  the  spirit  of  [8]). 

3.3  Software 

The  following  is  only  a  brief  overview  of  representative  routines  used 
for  high  resolution  spectral  analysis  of  scalar  time-series.  Additional 
information  and  Matlab  code  is  provided  at 
http://www.ece.umn.edu/users/gcorgiou/files/reports.html. 

Please  contact  the  principal  investigator  at  tryphonumn.edu  for  com¬ 
ments/input  and  for  subsequent  releases  of  Mat  lab- based  code  for  high 
resolution  spectral  analysis.  We  briefly  explain  how  the  software  we 
developed  can  be  used  to  resolve  sinusoids  — a  benchmark  problem 
We  begin  we  time-domain  data  and  some  prior  information  as  to  the 
frequency  range  of  interest.  A  filter-bank  (one  input  many  outputs)  is 
then  selected  with  a  bandpass  characteristic  over  the  frequency  range 
of  interest.  It  consists  of  a  dynamical  system 

Xk+i  =  Axk  +  Buk 

with  A ,  B  matrices  of  size  nxn,  and  nxm  respectively.  Of  course,  when 
the  time-series  Uk  is  scalar,  m  =  1.  The  time-series  is  considered  at 
present  to  have  zero  mean  and  is  adjusted  accordingly.  An  observation 
record 

{ui,u2,  .  .  .  ,u;v} 

is  typically  available,  and  on  the  basis  of  that  an  estimate  of  the  state- 
covariance 

P  =  E{xkx*k} 

is  obtained  using  routine  dlsim. complex .  m.  Relevant  theory  and  a 
number  of  routines,  e.g.,  the  ones  below, 


Name 

usage 

sm.m 

[f  r_lines ,  ampl.lines]  =sm(P ,  A ,  B ,  k) 

me.m 

me_spect=me (P , A , B , omega) 

envlp . m 

env=envlp (P , A , B , omega , noi selevel ) 

http. //www.  ece.umn.edu/users/georgiou/files/reports.html. 

can  be  used  for  spectral  analysis.  The  examples  above  determine  (a) 
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spectral  lines  consistent  with  the  data  (sm.m),  (b)  a  candidate  spectrum 
for  Uk  which  is  consistent  with  the  data  P  and  is  of  maximal  entropy 
(me.m),  and  (c)  an  envelop  for  the  amplitude  of  all  consistent  with  the 
data  spectral  lines  (envlp.m). 

The  resolution  of  the  above  routines  strongly  depends  on  the  choice 
of  A ,  B  and  on  the  variance  of  the  estimator  for  the  state  covariance 
P.  Tradeoffs  between  robustness  and  resolution  using  such  methods 
is  the  subject  of  a  Ph.D.  thesis  by  A.Nasiri-Amini  (December  2005) 
and  discussed  in  [32].  These  provide  guidelines  for  optimal  design  of 
input-to- state  filters  and  theoretical  bounds  for  the  expected  gains  in 
resolution. 

In  practice,  as  a  rule  of  thumb,  there  are  two  parameters  that  dictate 
the  performance  of  the  relevant  spectral  estimators:  the  time-constant 
of  the  input-to- state  filter 

G{z)  =  (zl  -A)~lB 

and  its  bandpass  character.  Routines  cjordan2.m  and  mjordan.m  can 
be  used  for  designing  suitable  ( A ,  B)  pairs  for  scalar  and  vectorial  time- 
series,  respectively.  Typically  one  needs  only  specify  a  (complex)  eigen¬ 
value^)  for  A  and  the  size  of  the  corresponding  Jordan  block(s).  The 
modulus  of  the  eigenvalues  dictates  the  time-constant  of  G(z )  and  the 
phase  specifies  the  band  pass  character.  Finally,  the  pair  is  then  nor¬ 
malized  to  satisfy 

AA*  +  BB*  =  I 

where  /  is  the  identity,  for  numerical  reasons.  Typically,  A  can  be 
chosen  to  have  one  Jordan  block  (when  Uk  is  scalar)  or  as  a  Kronecker 
product  of  such  a  matrix  with  the  identity  (as  in  mjordan.m). 

Routine  demol.m  exemplifies  the  performance  of  the  above  for  an 
academic  example  of  separating  two  sinusoids  in  background  noise.  Be¬ 
cause  of  the  band-pass  character  of  G(s )  and  the  fact  that  the  frame¬ 
work  relies  on  the  state-covariance  of  G(s),  the  performance  of  all  the 
above  is  impervious  to  color  noise  (as  long  as  it  is  relatively  white  over 
the  passband  of  G(s)).  Yet,  in  the  example  we  use  white  noise  for 
simplicity.  Figure  3.2  displays  a  typical  output.  The  “true”  spectrum 
of  the  time-series  u ^  is  represented  by  the  red  “noise  level”  and  two 
red  arrows  for  the  sinusoids  at  frequencies  1.3  and  1.35  rad/sec].  The 
subplots  on  the  right  represent  “zoom-in”  of  those  on  the  left,  focusing 
on  a  frequency  range  of  interest  [1.2, 1.5].  The  data  record  contains 
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only  100  points.  The  fourier-based  reconstruction  is  shown  in  green  in 
subplots  (2,1)  and,  magnified  in  (2,2).  It  is  apparent  that  the  length 
of  the  data  record,  which  is  very  short  compared  to  the  separation  of 
the  two  spectral  lines,  does  not  permit  periodogram  based  techniques 
to  achieve  any  level  of  resolution.  Our  high  resolutions  methods  pro¬ 
vide  very  accurate  spectral  estimates.  These  are  shown  in  blue.  In 
particular,  estimated  spectral  lines  are  indicated  with  blue  arrows  and 
are  compared  with  the  actual  spectral  lines  of  the  signal  in  subplots 
(1,1)  and  in  (1,2).  Subplots  (3,1)  and  (3,2)  show  the  computed  envelop 
that  bounds  all  power  spectra  which  axe  consistent  with  the  estimated 
generalized  statistics.  For  comparison  we  mark  the  position  of  the 
actual  frequencies  of  the  two  sinusoids  with  thin  red  lines.  Finally, 
subplots  (4,1)  and  (4,2)  show  the  maximum  entropy  power  spectrum 
which  is  consistent  with  the  generalized  statistics.  These  three  alter¬ 
natives  (blue  arrows  in  the  first  row,  and  blue  envelop  and  spectrum  in 
rows  3  and  4)  have  been  produced  with  the  above  routines.  They  indi¬ 
cate  a  remarkable  consistency  and  accuracy  which  has  been  confirmed 
by  theoretical  calculations  and  error  bounds  ([32],  [31],  [33]). 
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Figure  3.2:  Original  and  reconstructed  power  spectra 
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